Space Missions Dataset

Introduction

The goal of this data science project is to choose a dataset and develop a set of analyses from which to draw useful conclusions.

The dataset I have chosen contains all space missions recorded from 1957 through mid-2020. For each mission it records the most important information: launch site, date and time, the company responsible, mission status (successful or failed), and mission cost.

In [1]:
import numpy as np
np.set_printoptions(precision=2)
import plotly.express as px
import plotly.graph_objs as go
import pandas as pd
import country_converter as coco
from plotly.subplots import make_subplots

df=pd.read_csv("data/Space_Corrected.csv", keep_default_na=False)
In [8]:
# Collect every run of digits in the date string; the second one is taken as the year
df['Date'] = df['Datum'].str.findall(r'(\d+(?:\.\d+)?)')
df['year'] = pd.to_numeric(df['Date'].str[1], errors='coerce')

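# Strip thousands separators and surrounding spaces first (assuming they can occur, e.g. "1,160.0");
# otherwise such values would be coerced to NaN
df['cost'] = pd.to_numeric(df[' Rocket'].astype(str).str.replace(',', '', regex=False).str.strip(), errors='coerce')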

# Collapse every outcome other than "Success" into a single "Failure" category
df["status"] = "Failure"
df.loc[df["Status Mission"] == "Success", "status"] = "Success"

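# The country is the last comma-separated field of the location
df['country'] = df['Location'].str.split(', ').str[-1]

# Convert country names to ISO3 codes; names that cannot be matched are reported below
cc = coco.CountryConverter()
df['iso_country'] = cc.pandas_convert(df['country'], to='ISO3', not_found=None)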
Shahrud Missile Test Site not found in regex
New Mexico not found in regex
Yellow Sea not found in regex
Pacific Missile Range Facility not found in regex
Pacific Ocean not found in regex
Barents Sea not found in regex
Gran Canaria not found in regex
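
To make the two parsing steps above concrete, here is a minimal sketch on a single made-up row, using only the standard library. The two sample strings are assumptions about how `Datum` and `Location` are formatted, not values copied from the dataset.

```python
import re

# Hypothetical sample values, assumed to mirror the dataset's formatting.
sample_datum = "Fri Aug 07, 2020 05:12 UTC"
sample_location = "LC-39A, Kennedy Space Center, Florida, USA"

# Same pattern as in the cell above: every run of digits, in order of appearance.
digits = re.findall(r'(\d+(?:\.\d+)?)', sample_datum)

# With this layout the first run is the day of the month and the second is the year.
year = int(digits[1])

# The country is the last comma-separated field of the location.
country = sample_location.split(', ')[-1]

print(digits, year, country)
```

If a date string had a different layout, the second run of digits would no longer be the year, so the positional index is worth checking against a few real rows.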

Graph 1

Number of missions per year.

This graph shows how space research has developed over the years, making it possible to see whether interest is growing.

In [3]:
countYears = df['year'].value_counts().sort_index()
nYears = len(countYears)

# Hatch only the last bar to mark the most recent year, which is incomplete (data ends mid-2020)
shape = [''] * nYears
shape[-1] = '/'

fig = go.Figure(data=[go.Bar(
    x=countYears.index,
    y=countYears,
    marker_pattern_shape=shape
)])
fig.layout["title"] = "Annual number of missions"
fig.layout["xaxis"]["title"] = "Year"
fig.layout["yaxis"]["title"] = "Number of missions"
fig.update_layout(
    height=500)
fig

Graph 2

Average expenditure per year.

The following graph looks at space mission expenditure over the years, to understand whether spending rises along with interest. Unfortunately, not all missions have a recorded cost, so the results are only approximate.

In [4]:
yearCost = df.groupby('year')['cost'].mean().dropna()
nYears = len(yearCost)

# Hatch only the last bar to mark the most recent year, which is incomplete
shape = [''] * nYears
shape[-1] = '/'

fig = go.Figure(data=[go.Bar(
    x=yearCost.index,
    y=yearCost,
    marker_pattern_shape=shape
)])
fig.layout["title"] = "Average annual expenditure"
fig.layout["xaxis"]["title"] = "Year"
fig.layout["yaxis"]["title"] = "Cost of missions in $ million"
fig
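
Because not every mission has a recorded cost, it can help to see how many missions each yearly average is actually based on. The following is a minimal sketch on toy data (the `toy` frame and its values are made up for illustration); the column names `year` and `cost` match the ones created earlier.

```python
import pandas as pd

# Toy data standing in for the real table; None marks a mission with no recorded cost.
toy = pd.DataFrame({
    "year": [2018, 2018, 2018, 2019, 2019],
    "cost": [50.0, None, 62.0, None, None],
})

summary = toy.groupby("year")["cost"].agg(
    missions="size",     # all missions in the year, with or without a cost
    with_cost="count",   # missions whose cost is recorded
    mean_cost="mean",    # the mean ignores missing values
)

# Share of missions in each year that contribute to the average.
summary["coverage"] = summary["with_cost"] / summary["missions"]
print(summary)
```

A year with low coverage has an average that rests on only a few missions, so its bar should be read with more caution.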

Graph 3

Average mission cost over the years for the major space agencies.

This graph highlights the agencies that have spent the most per mission over their lifetime. As in the previous analysis, only agencies with available mission cost data are included.

In [5]:
# Mean cost per agency; agencies with no recorded cost are dropped
companyCost = df.groupby('Company Name')['cost'].mean().dropna().reset_index()

fig = px.bar(companyCost, x='Company Name', y='cost', height=500, width=1000).update_xaxes(categoryorder="total descending")
fig.layout["title"] = "Average cost to agencies over the years"
fig.layout["xaxis"]["title"] = "Company"
fig.layout["yaxis"]["title"] = "Average cost during the years in $ million"
fig

Graph 4

Graph 4.a

Percentage frequency of the different mission statuses.

The dataset records four mission statuses, of which successful missions alone account for almost 90 percent. The other three statuses are subsets of failed missions; because each is very small, I grouped them under a single failure category for the subsequent analyses.

This graph also shows that, although space missions are high-risk, the probability of failure is lower than might be expected.

In [6]:
countStatuses = df['Status Mission'].value_counts().sort_index()

fig = px.pie(values=countStatuses.values, names=countStatuses.index, title='Percentage distribution of mission status', height=500, width=500)
fig.update_traces(textinfo='percent+label', showlegend=False)
fig
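
The grouping of the three failure statuses into one category can be sketched on a toy series. The status labels and the counts below are assumptions chosen for illustration and do not reproduce the real proportions.

```python
import pandas as pd

# Toy sample; the three non-success labels are assumed names for the failure statuses.
statuses = pd.Series(
    ["Success"] * 17 + ["Failure", "Partial Failure", "Prelaunch Failure"]
)

# Percentage share of each of the four statuses.
share = statuses.value_counts(normalize=True) * 100

# Keep "Success" as it is and relabel everything else as "Failure".
collapsed = statuses.where(statuses == "Success", "Failure")
collapsed_share = collapsed.value_counts(normalize=True) * 100

print(share)
print(collapsed_share)
```

Collapsing the categories this way leaves the success share unchanged and simply sums the three small failure shares into one.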

Graph 4.b

Number of successful and failed missions over the years.

The following graph shows the number of missions in the two main states (success and failure) for each year of activity. It suggests that mission effectiveness has clearly improved over time, reducing the risk of failure. In recent years there has been a slight increase in failures, partly caused by the arrival of the first private companies in the space sector, which lacked experience and ran into several initial problems.

In [9]:
# One row per (year, status) pair, with the number of missions in a 'count' column
countYears = df.groupby(['year', 'status']).size().reset_index(name='count')

fig = px.bar(countYears, orientation="v", x="year", y="count", color="status", barmode="group", height=400, width=1300,
             category_orders={"status": ["Success", "Failure"]})

fig.layout["title"] = "Successful and failed missions over the years"
fig.layout["xaxis"]["title"] = "Year"
fig.layout["yaxis"]["title"] = "Number of missions"
fig.layout["legend"]["title"]["text"] = "Status"
fig

Graph 5.a and 5.b

Number of successful and failed missions by agency, compared with the percentage of each state by agency.

Stacking two charts that share the same categories makes them directly comparable. This makes it easy to see which agencies complete their missions most reliably; the success rate varies considerably with the number of missions performed, but here the two can be compared at a glance. For these graphs I selected only the most prominent agencies, so as to evaluate their performance alone. Alongside the government agencies I added the two earliest and best-known private companies in the field, SpaceX and Blue Origin.
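
The quantities behind the two charts (total missions and success percentage per agency) can also be computed in one step with a cross-tabulation. This is a minimal sketch on toy data with two hypothetical agencies, "A" and "B"; it is an alternative to the cell below, not a description of it.

```python
import pandas as pd

# Toy data: agency "A" has one failure, agency "B" has none.
toy = pd.DataFrame({
    "Company Name": ["A", "A", "A", "A", "B", "B"],
    "status": ["Success", "Success", "Success", "Failure", "Success", "Success"],
})

# Rows are agencies, columns are statuses, cells are mission counts.
counts = pd.crosstab(toy["Company Name"], toy["status"])

# Guarantee both columns exist even if a status never occurs in the data.
counts = counts.reindex(columns=["Success", "Failure"], fill_value=0)

counts["Total"] = counts["Success"] + counts["Failure"]
counts["Success %"] = counts["Success"] / counts["Total"] * 100
counts = counts.sort_values("Total", ascending=False)
print(counts)
```

An agency with very few missions can reach 100 percent easily, which is why the total is kept next to the percentage.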

In [10]:
agencies = ["NASA", "ESA", "SpaceX", "RVSN USSR", "Boeing", "US Air Force", "ASI", "Blue Origin", "JAXA", "Roscosmos", "CASC"]
companyMission = df.loc[df["Company Name"].isin(agencies), ["Company Name", "status"]]

# Series of mission counts per agency, sorted in descending order
orderedCompanyMission = companyMission.groupby(['Company Name']).size().sort_values(ascending=False)

# Sorted series of successful missions only
successMission = companyMission.loc[companyMission["status"] == "Success", ["Company Name"]]
successMission = successMission.groupby(['Company Name']).size().sort_values(ascending=False)

# Sorted series of failed missions only
failureMission = companyMission.loc[companyMission["status"] == "Failure", ["Company Name"]]
failureMission = failureMission.groupby(['Company Name']).size().sort_values(ascending=False)

# Complete DataFrame ordered by agency: column 0 is the total number of missions, column 1 the successes, column 2 the failures
# (an agency with no missions in a category gets 0 instead of NaN)
result = pd.concat([orderedCompanyMission, successMission, failureMission], axis=1).fillna(0)

color1 = [px.colors.qualitative.Plotly[2]] * len(orderedCompanyMission)
color2 = [px.colors.qualitative.Plotly[1]] * len(orderedCompanyMission)

fig = make_subplots(shared_xaxes=True, vertical_spacing=0.02, rows=2, cols=1)

# Subplot 1 - Total missions per agency
fig.add_trace(
    go.Bar(
        name="Total",
        x=result.index,
        y=result[0],
        offsetgroup=0,
    ),
    row=1,
    col=1,
)

# Subplot 2 - Mission status percentage per agency
fig.add_trace(
    go.Bar(
        name="Success",
        x=result.index,
        y=result[1]/result[0]*100,
        offsetgroup=0,
        marker_color=color1
    ),
    row=2,
    col=1,
)
fig.add_trace(
    go.Bar(
        name="Failure",
        x=result.index,
        y=result[2]/result[0]*100,
        offsetgroup=0,
        base=result[1]/result[0]*100,
        marker_color=color2
    ),
    row=2,
    col=1,
)

fig.update_xaxes(title_text="Company", row=2, col=1)
fig.update_yaxes(title_text="Number of missions", row=1, col=1)
fig.update_yaxes(title_text="Mission percentage", row=2, col=1)
fig.update_layout(title_text="Relationship between missions and agencies", height=700)
fig.layout["legend"]["title"]["text"] = "Legend"

fig.show()

Graph 6

Mission cost compared with mission status (success or failure).

This graph shows that most failed missions cluster at the lower end of the cost range, apart from a few exceptions. Successful missions are distributed more toward the center, suggesting that higher expenditure increases the probability of mission success.

In [11]:
fig = px.box(df, orientation="h", y="status", x="cost", height=500, width=900)

fig.layout["title"] = "Cost of missions based on success"
fig.layout["xaxis"]["title"] = "Cost in $ million"
fig.layout["yaxis"]["title"] = "Status"
fig.show()
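
The box plot summarizes each group by its quartiles, and the same numbers can be read off directly with a grouped summary. The sketch below uses made-up costs, so its figures say nothing about the real data; it only shows the computation.

```python
import pandas as pd

# Toy costs in $ million, invented for illustration.
toy = pd.DataFrame({
    "status": ["Success", "Success", "Success", "Failure", "Failure"],
    "cost": [40.0, 60.0, 200.0, 30.0, 50.0],
})

# Count, quartiles and median of cost for each status, i.e. what each box draws.
summary = toy.groupby("status")["cost"].describe()[["count", "25%", "50%", "75%"]]
print(summary)
```

Comparing the medians (the `50%` column) is less sensitive to a few very expensive missions than comparing the means.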

Graph 7

Geographic map showing the nations from which the missions were launched.

This analysis reflects the fact that some nations design their missions in collaboration with others, which also determines the launch site. For example, the Italian and German agencies rely on France, where one of the ESA (European Space Agency) launch sites is located. Likewise the U.S. itself, after the retirement of the Space Shuttle and before the advent of SpaceX, relied on Russia to launch its missions.

In [12]:
countCountry = df['iso_country'].value_counts().sort_index()

fig = px.choropleth(df, locations=countCountry.index,
                    color=countCountry,
                    color_continuous_scale=px.colors.sequential.Plasma,
                    height=600,
                    width=1000)
fig.update_layout(
    title_text='Map showing the nations from which the launches were made'
)
fig.layout.coloraxis.colorbar.title = 'Number of launches'
fig.show()